Abstract

We consider a class of spatio-temporal models which extend popular econometric spatial
autoregressive panel data models by allowing the scalar coefficients for each location (or panel)
to differ from each other. To overcome the innate endogeneity, we propose a generalized
Yule-Walker estimation method which applies the least squares estimation to a Yule-Walker
equation. The asymptotic theory is developed under the setting that both the sample size and
the number of locations (or panels) tend to infinity under a general setting for stationary and
α-mixing processes, which includes spatial autoregressive panel data models driven by i.i.d.
innovations as special cases. The proposed methods are illustrated using both simulated and
real data.
1 Introduction

The class of spatial autoregressive (SAR) models was introduced to model the cross-sectional
dependence of different economic individuals at different locations (Cliff and Ord, 1973). More recent
developments extend SAR models to spatial dynamic panel data (SDPD) models, i.e. adding time
lagged terms to account for serial correlations across different locations; see, e.g., Lee and Yu
(2010a). Baltagi et al. (2003) consider a static spatial panel model where the error term is a SAR
model. Lin and Lee (2010) show that in the presence of heteroskedastic disturbances, the maximum
likelihood estimator for the SAR models that does not take the heteroskedasticity into account
is generally inconsistent, and propose an alternative GMM estimation method. Computationally
the GMM methods are more efficient than the QML estimation (Lee, 2001). Lee and Yu (2010a)
classify SDPD models into three categories: stable, spatial cointegration and explosive cases.
As pointed out by Bai and Shi (2011), the cases with a large number of cross sectional units and
a long history are rare. Hence it is pertinent to consider the setting with short time spans in
order to include as many locations as possible. Both the estimation method and the asymptotic
analysis need to be adapted under this new setting. Yu et al. (2008) and Yu et al. (2012) investigate the
asymptotic properties when both the number of locations and the length of time series tend to
infinity for both the stable case and the spatial cointegration case, and show that the QMLE is consistent.

Motivated by the evidence in some practical examples, we extend the model in Yu et al. (2008)
and Yu et al. (2012) by allowing the scalar coefficients for each location (or panel) to differ from
each other. This increase in model capacity comes at the cost of estimating substantially more
parameters. In fact, the number of parameters in this new setting is of the order of
the number of locations. The model considered in this paper has four additive components: a
pure spatial effect, a pure dynamic effect, a time-lagged spatial effect and a white noise. Due to
the innate endogeneity, conventional regression estimation methods such as the least squares
method applied directly to the model lead to inconsistent estimators. To overcome the difficulties
caused by the endogeneity, we propose a generalized Yule-Walker type estimator for the
parameters in the model, which applies the least squares estimation to a Yule-Walker equation.
The asymptotic normality of the proposed estimators is established under the setting that both
the sample size n and the number of locations (or panels) p tend to infinity. Therefore the number
of parameters to be estimated also diverges to infinity, which is a marked difference from, e.g., Yu
et al. (2012). We develop the asymptotic properties under a general setting for stationary and
α-mixing processes, which includes the spatial autoregressive panel data models driven by i.i.d.
innovations as special cases.

The rest of the paper is organized as follows. Section 2 introduces the new model, its
motivation and the generalized Yule-Walker estimation method. The asymptotic theory for the
proposed estimation method is presented in Section 3. Simulation results and real data analysis
are reported in Sections 4 and 5, respectively. All the technical proofs are relegated to an
Appendix.


2 Model and Estimation Method 

2.1 Models 

The model considered in this paper is of the following form:

$$ y_t = D(\lambda_0) W y_t + D(\lambda_1) y_{t-1} + D(\lambda_2) W y_{t-1} + \varepsilon_t, \qquad (1) $$

where $y_t = (y_{1,t}, \ldots, y_{p,t})^T$ represents the observations from $p$ locations at time $t$,
$D(\lambda_k) = \mathrm{diag}(\lambda_{k1}, \ldots, \lambda_{kp})$ and $\lambda_{kj}$ is the unknown coefficient parameter for the $j$-th location, and $W$ is
the $p \times p$ spatial weight matrix which measures the dependence among different locations. All the
main diagonal elements of $W$ are zero. It is a common practice in spatial econometrics to assume
$W$ known. For example, we may let $w_{ij} = 1/(1 + d_{ij})$ for $i \ne j$, where $d_{ij} \ge 0$ is an appropriate
distance between the $i$-th and the $j$-th location. It can simply be the geographical distance between
the two locations or the distance reflecting the correlation or association between the variables
at the two locations. In the above model, $D(\lambda_0)$ captures the pure spatial effect, $D(\lambda_1)$ captures
the pure dynamic effect, and $D(\lambda_2)$ captures the time-lagged spatial effect. We also assume that
the error term $\varepsilon_t = (\varepsilon_{1,t}, \varepsilon_{2,t}, \ldots, \varepsilon_{p,t})^T$ in (1) satisfies the condition $\mathrm{Cov}(y_{t-1}, \varepsilon_t) = 0$. When
$\lambda_{k1} = \cdots = \lambda_{kp}$ for $k = 0, 1, 2$, (1) reduces to the model of Yu et al. (2008), in which there
are only 3 unknown regressive coefficient parameters. In general the regression function in (1)
contains $3p$ unknown parameters.
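The distance-based choice of $W$ mentioned above can be illustrated with a minimal sketch. It assumes planar coordinates and Euclidean distance; the function name `distance_weight_matrix` and the sample coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np

def distance_weight_matrix(coords):
    """Build W with w_ij = 1 / (1 + d_ij) for i != j and a zero main diagonal."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))   # Euclidean distances d_ij (an assumption)
    W = 1.0 / (1.0 + d)
    np.fill_diagonal(W, 0.0)                # all main diagonal elements of W are zero
    return W

# Hypothetical locations, for illustration only
coords = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
W = distance_weight_matrix(coords)
```

Any other distance (for example one based on correlations) could be substituted for the Euclidean one without changing the rest of the construction.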

The extension to use different scalar coefficients for different locations is motivated by practical
needs. For example, we analyze the monthly change rates of the consumer price index (CPI) for
the EU member states over the years 2003-2010. The detailed analysis for this data set will
be presented in Section 5. Figure 1 presents the scatter-plots of the observed data $y_{i,t}$ versus
the spatial regressor $w_i^T y_t$ and $y_{i,t-1}$, for some of the EU member states, where $w_i^T$ is the $i$-th
row vector of the weight matrix $W$, which is taken as the sample correlation matrix with all
the elements on the main diagonal set to be 0. The superimposed straight lines are the simple
regression lines estimated using the newly proposed method in Section 2.2 below. It is clear from
Figure 1 that at least Greece and Belgium should have a different slope from those of France or
Iceland.

2.2 Generalized Yule-Walker estimation 

As $y_t$ occurs on both sides of (1), $W y_t$ and $\varepsilon_t$ are correlated with each other. Applying the least
squares method directly, i.e. regressing $y_t$ on $(W y_t, y_{t-1}, W y_{t-1})$, leads to inconsistent
estimators. On the other hand, applying the maximum likelihood estimation requires profiling a
$p \times p$ nuisance parameter matrix $\Sigma_\varepsilon = \mathrm{Var}(\varepsilon_t)$, which leads to a complex nonlinear optimization
problem. Furthermore when $p$ is large in relation to $n$, the numerical stability is of concern.
Figure 1: Plots of the monthly change rates $y_{i,t}$ of CPI against the spatial regressor $w_i^T y_t$ (on the
top) and the dynamic regressor $y_{i,t-1}$ (on the bottom) for four EU member states in 2003-2010.
The superimposed straight lines were estimated by the newly proposed method in Section 2.2.


We propose below a new estimation method which applies the least squares method to each
individual row of a Yule-Walker equation. To this end, let $\Sigma_k = \mathrm{Cov}(y_{t+k}, y_t)$ for any $k \ge 0$.
Note that we always assume that $y_t$ is stationary; see condition A2 and Remark 1 in Section 3
below. Then the Yule-Walker equation below follows from (1) directly:

$$ (I - D(\lambda_0) W)\,\Sigma_1 = (D(\lambda_1) + D(\lambda_2) W)\,\Sigma_0, $$

where $I$ is the $p \times p$ identity matrix. The $i$-th row of the above equation is

$$ (e_i^T - \lambda_{0i} w_i^T)\,\Sigma_1 = (\lambda_{1i} e_i^T + \lambda_{2i} w_i^T)\,\Sigma_0, \qquad i = 1, \ldots, p, \qquad (2) $$

where $w_i^T$ is the $i$-th row vector of $W$, and $e_i$ is the unit vector with the $i$-th element equal to 1.
Note that (2) is a system of $p$ linear equations with three unknown parameters $\lambda_{0i}$, $\lambda_{1i}$ and $\lambda_{2i}$.
Since $E y_t = 0$, we replace $\Sigma_1$ and $\Sigma_0$ by the sample (auto)covariance matrices

$$ \hat\Sigma_1 = \frac{1}{n} \sum_{t=1}^{n} y_t y_{t-1}^T \quad \text{and} \quad \hat\Sigma_0 = \frac{1}{n} \sum_{t=1}^{n} y_t y_t^T. $$

We estimate $(\lambda_{0i}, \lambda_{1i}, \lambda_{2i})^T$ by the least squares method, i.e. by solving the minimization problem

$$ \min_{\lambda_{0i}, \lambda_{1i}, \lambda_{2i}} \big\| \hat\Sigma_1^T (e_i - \lambda_{0i} w_i) - \hat\Sigma_0 (\lambda_{1i} e_i + \lambda_{2i} w_i) \big\|_2^2. $$

The resulting estimators are called generalized Yule-Walker estimators, which admit the explicit
expression

$$ (\hat\lambda_{0i}, \hat\lambda_{1i}, \hat\lambda_{2i})^T = (\hat X_i^T \hat X_i)^{-1} \hat X_i^T \hat Y_i, \qquad (3) $$
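where

$$ \hat X_i = (\hat\Sigma_1^T w_i,\; \hat\Sigma_0 e_i,\; \hat\Sigma_0 w_i) \quad \text{and} \quad \hat Y_i = \hat\Sigma_1^T e_i. $$

More explicitly,

$$ \hat X_i = \Big( \frac{1}{n} \sum_{t=1}^{n} y_{t-1} (w_i^T y_t),\; \frac{1}{n} \sum_{t=1}^{n} y_{t-1}\, y_{i,t-1},\; \frac{1}{n} \sum_{t=1}^{n} y_{t-1} (w_i^T y_{t-1}) \Big), \qquad \hat Y_i = \frac{1}{n} \sum_{t=1}^{n} y_{t-1}\, y_{i,t}. $$

Then it holds that for $i = 1, \ldots, p$,

$$ \begin{pmatrix} \hat\lambda_{0i} \\ \hat\lambda_{1i} \\ \hat\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} = (\hat X_i^T \hat X_i)^{-1} \begin{pmatrix} \frac{1}{n} \sum_{t=1}^{n} y_{t-1}^T (w_i^T y_t) \times \frac{1}{n} \sum_{t=1}^{n} \varepsilon_{i,t}\, y_{t-1} \\ \frac{1}{n} \sum_{t=1}^{n} y_{t-1}^T\, y_{i,t-1} \times \frac{1}{n} \sum_{t=1}^{n} \varepsilon_{i,t}\, y_{t-1} \\ \frac{1}{n} \sum_{t=1}^{n} y_{t-1}^T (w_i^T y_{t-1}) \times \frac{1}{n} \sum_{t=1}^{n} \varepsilon_{i,t}\, y_{t-1} \end{pmatrix}. $$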
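The estimator (3) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names, the ring-shaped weight matrix and the coefficient values are all hypothetical. The sample version uses lagged observations for both covariance matrices, following the explicit form of the regressors. As a sanity check, the sketch also feeds in population covariances built from the model, for which the row equations (2) hold exactly and the coefficients should be recovered up to rounding error.

```python
import numpy as np

def gyw_from_cov(Sigma0, Sigma1, W):
    """Least squares on the rows of the Yule-Walker equation, one location at a time.

    Returns a (p, 3) array whose i-th row is (lambda_0i, lambda_1i, lambda_2i).
    """
    p = W.shape[0]
    est = np.empty((p, 3))
    for i in range(p):
        w_i = W[i]                                   # i-th row of W as a vector
        X_i = np.column_stack([Sigma1.T @ w_i,       # Sigma_1^T w_i
                               Sigma0[:, i],         # Sigma_0 e_i
                               Sigma0 @ w_i])        # Sigma_0 w_i
        Y_i = Sigma1[i]                              # Sigma_1^T e_i
        est[i] = np.linalg.lstsq(X_i, Y_i, rcond=None)[0]
    return est

def gyw_from_data(Y, W):
    """Plug sample (auto)covariances into gyw_from_cov; Y has rows y_0, ..., y_n."""
    n = Y.shape[0] - 1
    Sigma1_hat = Y[1:].T @ Y[:-1] / n                # (1/n) sum_t y_t y_{t-1}^T
    Sigma0_hat = Y[:-1].T @ Y[:-1] / n               # (1/n) sum_t y_{t-1} y_{t-1}^T
    return gyw_from_cov(Sigma0_hat, Sigma1_hat, W)

# Hypothetical toy setting: ring of p locations, heterogeneous coefficients
p = 8
W = np.zeros((p, p))
for i in range(p):
    W[i, (i - 1) % p] = 0.5
    W[i, (i + 1) % p] = 0.5
lam = np.column_stack([np.linspace(-0.3, 0.3, p),
                       np.linspace(0.3, -0.1, p),
                       np.linspace(0.1, 0.3, p)])
S_inv = np.linalg.inv(np.eye(p) - np.diag(lam[:, 0]) @ W)
A = S_inv @ (np.diag(lam[:, 1]) + np.diag(lam[:, 2]) @ W)

# Population covariances under unit-variance white noise (fixed-point iteration)
Sigma0 = np.zeros((p, p))
for _ in range(500):
    Sigma0 = A @ Sigma0 @ A.T + S_inv @ S_inv.T
Sigma1 = A @ Sigma0
est_pop = gyw_from_cov(Sigma0, Sigma1, W)

# Sample version on one simulated path
rng = np.random.default_rng(0)
n = 2000
Y = np.empty((n + 1, p))
y = np.zeros(p)
for t in range(-200, n + 1):                         # 200 burn-in steps
    y = A @ y + S_inv @ rng.normal(size=p)
    if t >= 0:
        Y[t] = y
est_sample = gyw_from_data(Y, W)
```

Solving with `lstsq` rather than forming $(\hat X_i^T \hat X_i)^{-1}$ explicitly gives the same minimizer when $\hat X_i$ has rank 3 and is numerically more stable.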


2.3 A root-n consistent estimator for large p 

When $p/\sqrt{n} \to \infty$, the estimator (3) admits non-standard convergence rates (i.e. rates
different from $\sqrt{n}$); see Theorems 2 and 4 in Section 3 below. Note that there are $p$ equations
with only 3 parameters in (2). Hence (3) can be viewed as a GMME for an over-determined
scenario. The estimation may suffer when the number of estimation equations increases. See, for
example, a similar result in Theorem 1 of Chang, Chen and Chen (2015). A further compounding
factor is that the estimation of the covariance matrices $\Sigma_0$, $\Sigma_1$ using their sample counterparts
leads to non-negligible errors even when $n \to \infty$. Below we propose an alternative estimator which
restricts the number of estimation equations to be used in order to restore the $\sqrt{n}$-consistency
and the asymptotic normality.

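For $i = 1, \ldots, p$, put $X_i = (\Sigma_1^T w_i, \Sigma_0 e_i, \Sigma_0 w_i)$. Note that the $k$-th row of $X_i$ is
$(e_k^T \Sigma_1^T w_i, e_k^T \Sigma_0 e_i, e_k^T \Sigma_0 w_i)$, which is the covariance between $y_{k,t-1}$ and $(w_i^T y_t, y_{i,t-1}, w_i^T y_{t-1})$. Let

$$ \rho_k^{(i)} = |e_k^T \Sigma_1^T w_i| + |e_k^T \Sigma_0 e_i| + |e_k^T \Sigma_0 w_i|, \qquad k = 1, \ldots, p. \qquad (4) $$

Then $\rho_k^{(i)}$ may be viewed as a measure for the correlation between $y_{k,t-1}$ and $(w_i^T y_t, y_{i,t-1}, w_i^T y_{t-1})^T$.
When $\rho_k^{(i)}$ is small, say, close to 0, the $k$-th equation in (2) carries little information on $(\lambda_{0i}, \lambda_{1i}, \lambda_{2i})$.
Therefore as far as the estimation of $(\lambda_{0i}, \lambda_{1i}, \lambda_{2i})$ is concerned, we only keep the $k$-th equation
in (2) for large $\rho_k^{(i)}$.

Let $z_{t-1}^{i}$ be the $d_i \times 1$ vector consisting of those $y_{k,t-1}$ corresponding to the $d_i$ largest $\hat\rho_k^{(i)}$
$(1 \le k \le p)$, where $\hat\rho_k^{(i)}$ is defined as in (4) but with $(\Sigma_1, \Sigma_0)$ replaced by $(\hat\Sigma_1, \hat\Sigma_0)$. The new
estimator is defined as

$$ (\tilde\lambda_{0i}, \tilde\lambda_{1i}, \tilde\lambda_{2i})^T = (\hat Z_i^T \hat Z_i)^{-1} \hat Z_i^T \tilde Y_i, \qquad i = 1, \ldots, p, \qquad (5) $$

where

$$ \hat Z_i = \Big( \frac{1}{n} \sum_{t=1}^{n} z_{t-1}^{i} (w_i^T y_t),\; \frac{1}{n} \sum_{t=1}^{n} z_{t-1}^{i}\, y_{i,t-1},\; \frac{1}{n} \sum_{t=1}^{n} z_{t-1}^{i} (w_i^T y_{t-1}) \Big) \qquad (6) $$

and

$$ \tilde Y_i = \frac{1}{n} \sum_{t=1}^{n} z_{t-1}^{i}\, y_{i,t}. $$

Now it holds that

$$ \begin{pmatrix} \tilde\lambda_{0i} \\ \tilde\lambda_{1i} \\ \tilde\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} = (\hat Z_i^T \hat Z_i)^{-1} \hat Z_i^T \Big( \frac{1}{n} \sum_{t=1}^{n} \varepsilon_{i,t}\, z_{t-1}^{i} \Big). $$

Theorem 3 in Section 3 below shows the asymptotic normality of the above estimator provided
that the number of estimation equations used satisfies the condition $d_i = o(\sqrt{n})$.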
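The row-selection idea behind (4) and (5) can be sketched as follows. Keeping the $d_i$ components of $y_{t-1}$ with the largest scores is the same as keeping the corresponding rows of the regressor matrix and response in (3). The sketch is illustrative only: the function name `restricted_gyw`, the ring-shaped $W$, the coefficient values and the choice $d = 4$ are hypothetical, and population covariances are used so that the result can be compared with the true coefficients.

```python
import numpy as np

def restricted_gyw(Sigma0, Sigma1, W, d):
    """For each location keep the d equations with the largest score (4), then solve by least squares."""
    p = W.shape[0]
    est = np.empty((p, 3))
    kept = []
    for i in range(p):
        w_i = W[i]
        X_i = np.column_stack([Sigma1.T @ w_i, Sigma0[:, i], Sigma0 @ w_i])
        Y_i = Sigma1[i]                          # Sigma_1^T e_i
        rho = np.abs(X_i).sum(axis=1)            # score of equation k: sum of absolute row entries
        rows = np.argsort(rho)[::-1][:d]         # indices of the d largest scores
        est[i] = np.linalg.lstsq(X_i[rows], Y_i[rows], rcond=None)[0]
        kept.append(np.sort(rows))
    return est, kept

# Hypothetical toy setting (ring of p locations, heterogeneous coefficients)
p = 8
W = np.zeros((p, p))
for i in range(p):
    W[i, (i - 1) % p] = 0.5
    W[i, (i + 1) % p] = 0.5
lam = np.column_stack([np.linspace(-0.3, 0.3, p),
                       np.linspace(0.3, -0.1, p),
                       np.linspace(0.1, 0.3, p)])
S_inv = np.linalg.inv(np.eye(p) - np.diag(lam[:, 0]) @ W)
A = S_inv @ (np.diag(lam[:, 1]) + np.diag(lam[:, 2]) @ W)

# Population covariances under unit-variance white noise
Sigma0 = np.zeros((p, p))
for _ in range(500):
    Sigma0 = A @ Sigma0 @ A.T + S_inv @ S_inv.T
Sigma1 = A @ Sigma0

est, kept = restricted_gyw(Sigma0, Sigma1, W, d=4)
```

With sample covariances in place of `Sigma0` and `Sigma1`, the same function gives a data-driven version; the selected equations then depend on the estimated scores.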


3 Theoretical properties 

We introduce some notation first. For a $p \times 1$ vector $v = (v_1, \ldots, v_p)^T$, $\|v\|_2 = \sqrt{\sum_{i=1}^{p} v_i^2}$ is the
Euclidean norm and $\|v\|_1 = \sum_{i=1}^{p} |v_i|$ is the $L_1$ norm. For a matrix $H = (h_{ij})$, $\|H\|_F = \sqrt{\mathrm{tr}(H^T H)}$
is the Frobenius norm and $\|H\|_2 = \sqrt{\lambda_{\max}(H^T H)}$ is the operator norm, where $\lambda_{\max}(\cdot)$ is the largest
eigenvalue of a matrix. We denote by $|H|$ the matrix $(|h_{ij}|)$, which is a matrix of the same size as
$H$ but with the $(i, j)$-th element $h_{ij}$ replaced by $|h_{ij}|$. Note that the determinant of $H$ is denoted by
$\det(H)$. A strictly stationary process $\{y_t\}$ is α-mixing if

$$ \alpha(k) = \sup_{A \in \mathcal{F}_{-\infty}^{0},\, B \in \mathcal{F}_{k}^{\infty}} |P(A) P(B) - P(AB)| \to 0, \quad \text{as } k \to \infty, \qquad (7) $$

where $\mathcal{F}_i^j$ denotes the σ-algebra generated by $\{y_t, i \le t \le j\}$. See, e.g., Section 2.6 of Fan and
Yao (2003) for a compact review of α-mixing processes.

Let $S(\lambda_0) = I - D(\lambda_0) W$ be invertible. It follows from (1) that

$$ y_t = A y_{t-1} + S^{-1}(\lambda_0)\, \varepsilon_t, $$

where $A = S^{-1}(\lambda_0)(D(\lambda_1) + D(\lambda_2) W)$. Some regularity conditions are now in order.
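The reduced form above gives a direct way to simulate from the model. The following is a minimal sketch under stated assumptions: Gaussian unit-variance errors, a burn-in period, and a check that the spectral radius of $A$ is below 1 (the usual sufficient condition for a stationary first-order vector autoregression). The function names, the ring-shaped $W$ and the coefficient values are hypothetical.

```python
import numpy as np

def reduced_form(lam0, lam1, lam2, W):
    """Return A = S^{-1}(D(lam1) + D(lam2) W) and S^{-1}, with S = I - D(lam0) W."""
    p = W.shape[0]
    S_inv = np.linalg.inv(np.eye(p) - np.diag(lam0) @ W)
    A = S_inv @ (np.diag(lam1) + np.diag(lam2) @ W)
    return A, S_inv

def simulate(lam0, lam1, lam2, W, n, burn_in=200, seed=0):
    """Simulate y_0, ..., y_n via y_t = A y_{t-1} + S^{-1} eps_t (Gaussian errors assumed)."""
    rng = np.random.default_rng(seed)
    A, S_inv = reduced_form(lam0, lam1, lam2, W)
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        raise ValueError("spectral radius of A is not below 1")
    p = W.shape[0]
    y = np.zeros(p)
    Y = np.empty((n + 1, p))
    for t in range(-burn_in, n + 1):
        y = A @ y + S_inv @ rng.normal(size=p)
        if t >= 0:
            Y[t] = y
    return Y

# Hypothetical toy setting
p = 6
W = np.zeros((p, p))
for i in range(p):
    W[i, (i - 1) % p] = 0.5
    W[i, (i + 1) % p] = 0.5
lam0 = np.linspace(-0.3, 0.3, p)
lam1 = np.linspace(0.3, -0.1, p)
lam2 = np.linspace(0.1, 0.3, p)
A, S_inv = reduced_form(lam0, lam1, lam2, W)
Y = simulate(lam0, lam1, lam2, W, n=50)

# One step of the reduced form satisfies the structural equation of the model
rng = np.random.default_rng(1)
y_prev, eps = rng.normal(size=p), rng.normal(size=p)
y_new = A @ y_prev + S_inv @ eps
resid = y_new - lam0 * (W @ y_new) - lam1 * y_prev - lam2 * (W @ y_prev)
```

Here `resid` should coincide with `eps` up to rounding, which confirms that the reduced form and the structural form describe the same recursion.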

A1. The spatial weight matrix $W$ is known with zero main diagonal elements; $S(\lambda_0)$ is invertible.

A2. (a) The disturbance $\varepsilon_t$ satisfies

$$ \mathrm{Cov}(y_{t-1}, \varepsilon_t) = 0. $$

(b) The process $\{y_t\}$ in model (1) is strictly stationary and α-mixing with $\alpha(k)$, defined in
(7), satisfying

$$ \sum_{k=1}^{\infty} \alpha(k)^{\frac{\gamma}{4+\gamma}} < \infty, $$

for some constant $\gamma > 0$.

(c) For $\gamma > 0$ specified in (b) above,

$$ \sup_p E|w_i^T \Sigma_0 y_t|^{4+\gamma} < \infty, \quad \sup_p E|w_i^T \Sigma_1 y_t|^{4+\gamma} < \infty, \quad \sup_p E|e_i^T \Sigma_0 y_t|^{4+\gamma} < \infty, $$

$$ \sup_p E|w_i^T y_t|^{4+\gamma} < \infty, \quad \sup_p E|e_i^T y_t|^{4+\gamma} < \infty, $$

where $w_i^T$ denotes the $i$-th row of $W$. The diagonal elements of $V_i$ defined in (8) are bounded
uniformly in $p$.

A3. The rank of the matrix $(\Sigma_1^T w_i, \Sigma_0 e_i, \Sigma_0 w_i)$ is equal to 3.

Remark 1. Condition A1 is standard for spatial econometric models. Condition A3 ensures that
$\lambda_{0i}$, $\lambda_{1i}$ and $\lambda_{2i}$ are identifiable in (2). Condition A2(c) limits the dependence across different
spatial locations. It is implied by, for example, the conditions imposed in Yu et al. (2008).
Lemma 1 in the Appendix shows that Condition A2 holds with $\gamma = 4$ under conditions A1 and
B1 - B3 below. Note that conditions B1 - B3 are often directly imposed in the spatial econometrics
literature including, for example, Lee and Yu (2010a) and Yu et al. (2008).

B1. The errors $\varepsilon_{i,t}$ are i.i.d. across $i$ and $t$ with $E(\varepsilon_{i,t}) = 0$, $\mathrm{Var}(\varepsilon_{i,t}) = \sigma^2$, and $E|\varepsilon_{i,t}|^{4+\gamma} < \infty$.
The density function of $\varepsilon_{i,t}$ exists.

B2. The row and column sums of $|W|$ and $|S^{-1}(\lambda_0)|$ are bounded uniformly in $p$.

B3. The row and column sums of l^l are bounded uniformly in p. 

Now we are ready to present the asymptotic properties of $(\hat\lambda_{0i}, \hat\lambda_{1i}, \hat\lambda_{2i})^T$, $i = 1, \ldots, p$, first with
fixed $p$ and $n \to \infty$, and then with $p \to \infty$ and $n \to \infty$.


3.1 Asymptotics for fixed p 

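For $i = 1, \ldots, p$, let

$$ \Sigma_{y,\varepsilon_i}(j) = \mathrm{Cov}(y_{t-1+j}\, \varepsilon_{i,t+j},\; y_{t-1}\, \varepsilon_{i,t}), \qquad j = 0, 1, 2, \ldots, $$

$$ \Sigma_{y,\varepsilon_i} = \Sigma_{y,\varepsilon_i}(0) + \sum_{j=1}^{\infty} \big[ \Sigma_{y,\varepsilon_i}(j) + \Sigma_{y,\varepsilon_i}^T(j) \big], $$

and

$$ V_i = \begin{pmatrix} w_i^T \Sigma_1 \Sigma_1^T w_i & w_i^T \Sigma_1 \Sigma_0 e_i & w_i^T \Sigma_1 \Sigma_0 w_i \\ w_i^T \Sigma_1 \Sigma_0 e_i & e_i^T \Sigma_0 \Sigma_0 e_i & e_i^T \Sigma_0 \Sigma_0 w_i \\ w_i^T \Sigma_1 \Sigma_0 w_i & e_i^T \Sigma_0 \Sigma_0 w_i & w_i^T \Sigma_0 \Sigma_0 w_i \end{pmatrix}, \qquad (8) $$

$$ U_i = \begin{pmatrix} w_i^T \Sigma_1 \Sigma_{y,\varepsilon_i} \Sigma_1^T w_i & w_i^T \Sigma_1 \Sigma_{y,\varepsilon_i} \Sigma_0 e_i & w_i^T \Sigma_1 \Sigma_{y,\varepsilon_i} \Sigma_0 w_i \\ w_i^T \Sigma_1 \Sigma_{y,\varepsilon_i} \Sigma_0 e_i & e_i^T \Sigma_0 \Sigma_{y,\varepsilon_i} \Sigma_0 e_i & e_i^T \Sigma_0 \Sigma_{y,\varepsilon_i} \Sigma_0 w_i \\ w_i^T \Sigma_1 \Sigma_{y,\varepsilon_i} \Sigma_0 w_i & e_i^T \Sigma_0 \Sigma_{y,\varepsilon_i} \Sigma_0 w_i & w_i^T \Sigma_0 \Sigma_{y,\varepsilon_i} \Sigma_0 w_i \end{pmatrix}. \qquad (9) $$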
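Theorem 1 Let conditions A1 - A3 hold and $p \ge 1$ be fixed. Then as $n \to \infty$, it holds that

$$ \sqrt{n} \left( \begin{pmatrix} \hat\lambda_{0i} \\ \hat\lambda_{1i} \\ \hat\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} \right) \xrightarrow{d} N(0,\; V_i^{-1} U_i V_i^{-1}), \qquad i = 1, \ldots, p, $$

where $V_i$ and $U_i$ are given in (8) and (9).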


3.2 Asymptotics with diverging p 

When $p$ diverges together with $n$, $U_i$ and $V_i$ in (9) and (8) are no longer constant matrices. Let $U_i^{-1/2}$
be a matrix such that $(U_i^{-1/2})^2 = U_i^{-1}$.


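Theorem 2 Let conditions A1 - A3 hold.

(i) As $n \to \infty$, $p \to \infty$ and $p = o(\sqrt{n})$,

$$ \sqrt{n}\, U_i^{-1/2} V_i \left( \begin{pmatrix} \hat\lambda_{0i} \\ \hat\lambda_{1i} \\ \hat\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} \right) \xrightarrow{d} N(0, I_3), \qquad i = 1, \ldots, p. $$

(ii) As $n \to \infty$, $p \to \infty$, $\sqrt{n} = O(p)$ and $p = o(n)$,

$$ \left\| \begin{pmatrix} \hat\lambda_{0i} \\ \hat\lambda_{1i} \\ \hat\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} \right\|_2 = O_p\Big( \frac{p}{n} \Big), \qquad i = 1, \ldots, p. $$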


Theorem 2 indicates that the standard root-$n$ convergence rate prevails as long as $p = o(\sqrt{n})$.
However the convergence rate may be slower when $p$ is of higher order than $\sqrt{n}$. Theorem 2
presents the convergence rates for the $L_2$ norm of the estimation errors. The rates also hold for
the $L_1$ norm of the errors. Corollary 1 considers the estimation errors over $p$ locations
together, for which we have established the result for the $L_1$ norm only.


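Corollary 1 Let condition A1 hold, and conditions A2 and A3 hold for all $i = 1, \ldots, p$. Then as
$n \to \infty$ and $p \to \infty$, it holds that

$$ \frac{1}{p} \sum_{i=1}^{p} \left\| \begin{pmatrix} \hat\lambda_{0i} \\ \hat\lambda_{1i} \\ \hat\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} \right\|_1 = O_p\Big( \frac{p}{n} \Big) \quad \text{if } \frac{p}{\sqrt{n}} \to \infty \text{ and } \frac{p}{n} = o(1). $$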


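To derive the asymptotic properties of the estimators defined in (5), we introduce some new
notation. For $i = 1, \ldots, p$, let

$$ \Sigma_0^{i} = \mathrm{Cov}(y_t, z_t^{i}), \qquad \Sigma_1^{i} = \mathrm{Cov}(y_t, z_{t-1}^{i}), $$

$$ \Sigma_{z^i,\varepsilon_i}(j) = \mathrm{Cov}(z_{t-1+j}^{i}\, \varepsilon_{i,t+j},\; z_{t-1}^{i}\, \varepsilon_{i,t}), \qquad j = 0, 1, 2, \ldots, $$

and

$$ \Sigma_{z^i,\varepsilon_i} = \Sigma_{z^i,\varepsilon_i}(0) + \sum_{j=1}^{\infty} \big[ \Sigma_{z^i,\varepsilon_i}(j) + \Sigma_{z^i,\varepsilon_i}^T(j) \big]. $$

Let

$$ V_i^{*} = \begin{pmatrix} w_i^T \Sigma_1^{i} (\Sigma_1^{i})^T w_i & w_i^T \Sigma_1^{i} (\Sigma_0^{i})^T e_i & w_i^T \Sigma_1^{i} (\Sigma_0^{i})^T w_i \\ w_i^T \Sigma_1^{i} (\Sigma_0^{i})^T e_i & e_i^T \Sigma_0^{i} (\Sigma_0^{i})^T e_i & e_i^T \Sigma_0^{i} (\Sigma_0^{i})^T w_i \\ w_i^T \Sigma_1^{i} (\Sigma_0^{i})^T w_i & e_i^T \Sigma_0^{i} (\Sigma_0^{i})^T w_i & w_i^T \Sigma_0^{i} (\Sigma_0^{i})^T w_i \end{pmatrix} \qquad (10) $$

and

$$ U_i^{*} = \begin{pmatrix} w_i^T \Sigma_1^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_1^{i})^T w_i & w_i^T \Sigma_1^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T e_i & w_i^T \Sigma_1^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T w_i \\ w_i^T \Sigma_1^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T e_i & e_i^T \Sigma_0^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T e_i & e_i^T \Sigma_0^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T w_i \\ w_i^T \Sigma_1^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T w_i & e_i^T \Sigma_0^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T w_i & w_i^T \Sigma_0^{i} \Sigma_{z^i,\varepsilon_i} (\Sigma_0^{i})^T w_i \end{pmatrix}. \qquad (11) $$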

Theorem 3 below indicates that the estimators defined in (5) are asymptotically normal with
the standard $\sqrt{n}$-rate as long as $d_i = o(\sqrt{n})$. Note that it does not impose any conditions directly
on the size of $p$.


A4. (a) For $\gamma > 0$ specified in A2(b),

$$ \sup_p E|w_i^T \Sigma_0^{i} z_t^{i}|^{4+\gamma} < \infty, \quad \sup_p E|w_i^T \Sigma_1^{i} z_t^{i}|^{4+\gamma} < \infty, \quad \sup_p E|e_i^T \Sigma_0^{i} z_t^{i}|^{4+\gamma} < \infty, $$

$$ \sup_p E|w_i^T y_t|^{4+\gamma} < \infty, \quad \sup_p E|e_i^T y_t|^{4+\gamma} < \infty, $$

and the diagonal elements of $V_i^{*}$ defined in (10) are bounded uniformly in $p$.

(b) The rank of the matrix $E(\hat Z_i)$ is equal to 3, where $\hat Z_i$ is defined in (6).


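Theorem 3 Let conditions A1, A2(a,b) and A4 hold. As $n \to \infty$, $p \to \infty$ and $d_i = o(\sqrt{n})$, it
holds that

$$ \sqrt{n}\, (U_i^{*})^{-1/2} V_i^{*} \left( \begin{pmatrix} \tilde\lambda_{0i} \\ \tilde\lambda_{1i} \\ \tilde\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} \right) \xrightarrow{d} N(0, I_3), \qquad i = 1, \ldots, p, $$

where $V_i^{*}$ and $U_i^{*}$ are given in (10) and (11).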


The key assumption of Theorem 2 is A2(c), which implies that the effect of the dimensionality
$p$ only comes from $E_1$ in equation (13) in the Appendix. We can relax this assumption
by allowing $E_2$ to be affected by $p$ as well. Under the relaxed assumption, we may obtain a
better convergence rate for estimator (3) by making use of the fact that (3) is invariant if we divide
both the numerator and denominator by the same number, for example, a number relating to $p$.
This will be presented in Theorem 4. We propose the new relaxed assumption:
A5. For $\gamma > 0$ specified in A2(b),

$$ \max\Big\{ \sup_p E|w_i^T \Sigma_0 y_t|^{4+\gamma},\; \sup_p E|w_i^T \Sigma_1 y_t|^{4+\gamma},\; \sup_p E|e_i^T \Sigma_0 y_t|^{4+\gamma} \Big\} = O(s_0(p)), $$

$$ \max\Big\{ \sup_p E|w_i^T y_t|^{4+\gamma},\; \sup_p E|e_i^T y_t|^{4+\gamma} \Big\} = O(s_1(p)), $$

and the diagonal elements of $V_i$ defined in (8) are of the order of $s_2(p)$, where $s_0(p)$, $s_1(p)$
and $s_2(p)$ are numbers relating to $p$.


Denote by $C$ a constant. When the number of nonzero elements (or elements bounded away
from zero) in $w_i$ increases with $p$ but is $o(p)$, we may have $s_1(p) = o(\min\{s_0(p), s_2(p)\})$. Simulation
scenario 2 falls under this case. When there are only a finite number of nonzero elements (or elements
bounded away from zero) in $w_i$, we might have $s_1(p) \asymp C$, which is the case of simulation scenario
1. The reason we assume the diagonal elements of $V_i$ defined in (8) are of the order of $s_2(p)$ is
that we can treat $w_i^T \Sigma_1 \Sigma_1^T w_i$, $e_i^T \Sigma_0 \Sigma_0 e_i$ and $w_i^T \Sigma_0 \Sigma_0 w_i$ as the second moments of three random
variables $w_i^T \Sigma_1 x$, $e_i^T \Sigma_0 x$ and $w_i^T \Sigma_0 x$ respectively, where the $p \times 1$ random vector $x$ has mean 0
and covariance matrix $I_p$.


Theorem 4 Let conditions Al, A2(a,b), A3 and A5 hold. As n —>• oo, p —> oo, if = o(n) 
and s l J 2 (p) = 0 (ps 1 / 2 (p)s 2 (p)), it holds that 


^ Aoi ^ 
Ai i 

V A2i / 


^ Aoi ^ 
Aii 

V A2i ) 


= Op | max 


3/4, 


( Psf ~(p) sl /4 {p) -j\ 

l ns 2 {p) 1 y/ns 2 (p) J 


Let us consider some examples. (1) When so(p) x p, s\(p) x C and s 2 (p) x p, the convergence 
rate is max | A ,J . (2) When s 0 (p) - P , «i(p) - y/P and s 2 (p) x p, if p = o(n 2 ), the 
convergence rate is max |, ^-A 3 y 4 j■ (3) When sq{p) x C, s\(p) x C and s 2 (p) x C, if 
$p = o(n)$, the convergence rate is $\max\{p/n,\, 1/\sqrt{n}\}$, which agrees with Theorem 2. Theorem
4 indicates that under different situations of $s_0(p)$, $s_1(p)$ and $s_2(p)$, we may obtain different
convergence rates. These observations are illustrated by simulation examples in Section 4.


4 Simulation study 

To examine the finite sample performance of the proposed estimation methods, we conduct
simulations under different scenarios.

4.1 Scenario 1 

$\lambda_{0i}$, $\lambda_{1i}$ and $\lambda_{2i}$ are generated from $U(-0.6, 0.6)$. The spatial weight matrix $W$ used is a block
diagonal matrix formed by a $\sqrt{p} \times \sqrt{p}$ row-normalized matrix $W^*$. We construct $W^*$ such that the
first four sub-diagonal elements are all 1 and the remaining elements are all 0 before normalizing. This
kind of $W$ corresponds to the pooling of $\sqrt{p}$ separate districts with similar neighboring structures
in each district; see Lee and Yu (2013). The errors $\varepsilon_{i,t}$ are independently generated from $N(0, \sigma_i^2)$,
where we generate each $\sigma_i$ from $U(0.5, 1.5)$.
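A block-diagonal weight matrix of this kind can be sketched as below. The description of $W^*$ leaves some room for interpretation; this sketch assumes that ones are placed on the first four diagonals on both sides of the main diagonal, so that every row has at least one neighbor before normalizing. That reading, and the function name `scenario1_weight_matrix`, are assumptions rather than the paper's specification.

```python
import numpy as np

def scenario1_weight_matrix(p, bandwidth=4):
    """Block-diagonal W built from sqrt(p) copies of a banded, row-normalized W*."""
    m = int(round(np.sqrt(p)))
    if m * m != p:
        raise ValueError("p must be a perfect square")
    idx = np.arange(m)
    dist = np.abs(idx[:, None] - idx[None, :])
    # Assumption: neighbors are the locations within `bandwidth` positions on either side
    W_star = ((dist >= 1) & (dist <= bandwidth)).astype(float)
    W_star /= W_star.sum(axis=1, keepdims=True)      # row-normalize
    return np.kron(np.eye(m), W_star)                # one copy of W* per district

W25 = scenario1_weight_matrix(25)
W100 = scenario1_weight_matrix(100)
```

The Kronecker product with the identity places one copy of $W^*$ per district on the main diagonal and zeros elsewhere, so locations in different districts do not interact.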

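For all scenarios, we generate data from (2.1), i.e. model (1) of Section 2, with different settings for $n$ and $p$. We apply
the proposed estimation methods (2.3) and (2.5), i.e. estimators (3) and (5) of Section 2 (with $d_i = \min(p, n^{10/21})$), and report the mean
absolute errors:

$$ \mathrm{MAE}(i) = \frac{1}{3} \sum_{j=0}^{2} |\hat\lambda_{ji} - \lambda_{ji}|, \qquad \mathrm{MAE} = \frac{1}{p} \sum_{i=1}^{p} \mathrm{MAE}(i). $$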
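The two error measures translate directly into code. In this small sketch the estimates and true values are stored as arrays with one row per location and one column per coefficient; the function name and the numbers are made up for illustration.

```python
import numpy as np

def mean_absolute_errors(est, truth):
    """Return (MAE(i) for each location i, overall MAE) for (p, 3) coefficient arrays."""
    est = np.asarray(est, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mae_i = np.abs(est - truth).mean(axis=1)   # average over the three coefficients j = 0, 1, 2
    return mae_i, mae_i.mean()                 # average over the p locations

# Made-up numbers, p = 2 locations
est = np.array([[0.1, 0.2, 0.3], [0.0, 0.0, 0.0]])
truth = np.array([[0.0, 0.2, 0.6], [0.3, 0.0, -0.3]])
mae_i, mae = mean_absolute_errors(est, truth)
```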

We replicate each setting 500 times. 

Figure 2 depicts boxplots of MAE with $p$ equal to 25 and 100, respectively. As the
sample size $n$ increases from 100, 250, 500, 750 to 1000, MAE decreases for both methods.


Figure 2: Boxplots of MAE for estimator (2.3) (left panels) and estimator (2.5) (right panels)
with p = 25 (top panels) and 100 (bottom panels), n = 100, 250, 500, 750, 1000 for scenario 1.

Figure 3 depicts the boxplots of the MAE for the original estimator (2.3), the root-n consistent
estimator (2.5), and the estimator (2.5) with the ridge penalty, where we choose the ridge tuning
parameter to be C x £ in order to avoid the nearly singularity problem of Z^Z^, and C is chosen 
via cross validation. With $n = 500$, the dimension $p$ is set at 25, 49, 64, 81, 100, 169, 324 and 529
respectively. The MAE for (2.3) remains at about the same level as $p$ increases; see the panel on
the left in Figure 3. This is in line with the asymptotic result of Theorem 4 when, for example,
$s_1(p) \asymp C$, $s_0(p) \asymp p$ and $s_2(p) \asymp p$. In contrast, the MAE for estimator (2.5) increases sharply
when $p$ increases; see the panel in the middle. This is due to the fact that $\hat Z_i^T \hat Z_i$ is nearly singular
for large $p$. Adding a ridge penalty to the estimator mitigates the deterioration when $p$ increases;
see the panel on the right in Figure 3.



Figure 3: Boxplots of MAE of the original estimator (2.3) (the left panel), the root-n consistent
estimator (2.5) (the middle panel), and the estimator (2.5) after adding the ridge penalty (the right
panel) with n = 500 and p = 25, 49, 64, 81, 100, 169, 324, 529 for scenario 1.

4.2 Scenario 2 

$\lambda_{0i}$, $\lambda_{1i}$ and $\lambda_{2i}$ are generated from $U(-0.6, 0.6)$. The spatial weight matrix $W$ is constructed as
follows. First, we construct a $\sqrt{p} \times \sqrt{p}$ row-normalized matrix $W^*$, where $W^*$ is chosen such that
the first two sub-diagonal elements are all 1 and the remaining elements are all 0 before normalizing.
Then we treat $W$ as a $\sqrt{p} \times \sqrt{p}$ block matrix and put $W^*$ into the main diagonal, 2nd, 4th,
6th, etc. sub-diagonal block positions. This kind of $W$ corresponds to the pooling of $\sqrt{p}$
districts (each district has $\sqrt{p}$ locations) in which the evenly numbered districts are connected and
the oddly numbered districts are connected, but evenly numbered districts and oddly numbered
districts are separated. Each district has a similar neighboring structure. As $p$ increases, the
number of locations influencing one specific location increases in the order of $\sqrt{p}$. The errors
$\varepsilon_{i,t}$ are independently generated from $N(0, \sigma_i^2)$, where we generate each $\sigma_i$ from $U(0.5, 1.5)$.

Figure 4 depicts boxplots of MAE with $p$ equal to 25 and 100, respectively. As the
sample size $n$ increases from 100, 250, 500, 750 to 1000, MAE decreases for both methods.

Figure 5 depicts three boxplots as in Figure 3. The MAE for (2.3) increases steadily as $p$ increases,
which matches the result of Theorem 4 when, for instance, $s_1(p) \asymp \sqrt{p}$, $s_0(p) \asymp p$ and $s_2(p) \asymp p$.
The MAE for (2.5) after adding the ridge penalty increases slowly as well. This might be caused
by the fact that, similar to A2(c), the quantities in condition A4(a) are also influenced by $p$ since the
number of nonzero elements in $w_i$ is of the order of $\sqrt{p}$.

Figure 4: Boxplots of MAE for estimator (2.3) (left panels) and estimator (2.5) (right panels)
with p = 25 (top panels) and 100 (bottom panels), n = 100, 250, 500, 750, 1000 for scenario 2.


5 Real data analysis 

5.1 European Consumer Price Indices 

We analyze the monthly change rates of the consumer price index (CPI) for the EU member states, 
over the years 2003-2010. We use the national harmonized index of consumer prices calculated 
by Eurostat, the statistical office of the European Union. For this data set, n = 96 and p = 31. 

Figure 6 presents the time series plots of the monthly change rates of CPI for the 31 states.
To line up the curves together, each series is centered at its mean value in Figure 6. The
fluctuations are clearly synchronized across different states, indicating the spatial (i.e. cross-state)
correlations among different states. Also noticeable are the varying degrees of fluctuation
over the different states.
Figure 5: Boxplots of MAE of the original estimator (2.3) (the left panel), the root-n consistent
estimator (2.5) (the middle panel), and the estimator (2.5) after adding the ridge penalty (the right
panel) with n = 500 and p = 25, 49, 64, 81, 100, 169, 324, 529 for scenario 2.

Figure 6: Time series plots of the monthly change rates of CPI for the 31 EU member states. Each series
is centered by subtracting its mean value.

Let $y_t$ consist of the monthly change rates of CPI for the 31 states. We fit the proposed
spatial-temporal model (1) to this data set with the parameters estimated by (3). We take a
normalized sample correlation matrix of $y_t$ as the spatial weight matrix $W = (w_{ij})$, i.e. we let
$w_{ij}$ be the absolute value of the sample correlation between the $i$-th and $j$-th states for $i \ne j$, and
$w_{ii} = 0$, and then replace $w_{ij}$ by $w_{ij} / \sum_k w_{kj}$.
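A correlation-based weight matrix of this form can be sketched as follows. The sketch takes absolute sample correlations, zeroes the diagonal, and divides each entry $w_{ij}$ by the sum over $k$ of $w_{kj}$; this reading of the normalization, the function name and the random stand-in data are assumptions, and the CPI data themselves are not reproduced here.

```python
import numpy as np

def correlation_weight_matrix(Y):
    """Weights from absolute sample correlations; Y has one row per time point, one column per state."""
    R = np.abs(np.corrcoef(Y, rowvar=False))   # |sample correlation| between columns
    np.fill_diagonal(R, 0.0)                   # w_ii = 0
    return R / R.sum(axis=0, keepdims=True)    # divide w_ij by sum_k w_kj (assumed normalization)

# Random stand-in data with n = 96 time points and 5 series
rng = np.random.default_rng(0)
Y = rng.normal(size=(96, 5))
W = correlation_weight_matrix(Y)
```

Because the absolute correlation matrix is symmetric, normalizing by row sums instead would simply give the transpose of this matrix.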

Figure 7 presents the scatter plots of $y_{i,t}$ against, respectively, the 3 regressors in model
(1), i.e. $w_i^T y_t$, $y_{i,t-1}$, $w_i^T y_{t-1}$, for four selected states: Belgium, Greece, France and Iceland. We
superimpose the straight line $y = \hat\lambda_{ji} x$ in each of those 3 scatter plots with, respectively, $j = 0, 1, 2$.
It is clear that the estimated slopes are very different for those 4 states. Figure 8 plots the true
monthly change rates of the CPI for those 4 states together with the fitted values

$$ \hat y_{i,t} = \hat\lambda_{0i} w_i^T y_t + \hat\lambda_{1i} y_{i,t-1} + \hat\lambda_{2i} w_i^T y_{t-1}. \qquad (12) $$
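The fitted values in (12) can be computed for all locations and time points at once. The sketch below is illustrative: the function name, the ring-shaped $W$ and the coefficients are hypothetical. As a check it uses a noise-free path generated from the model, for which the fitted values should reproduce the observations exactly.

```python
import numpy as np

def fitted_values(Y, W, lam):
    """Fitted values (12) for t = 1, ..., n; Y has rows y_0, ..., y_n and lam has shape (p, 3)."""
    Y_now, Y_lag = Y[1:], Y[:-1]
    spatial = Y_now @ W.T            # entry (t, i) equals w_i^T y_t
    lagged_spatial = Y_lag @ W.T     # entry (t, i) equals w_i^T y_{t-1}
    return spatial * lam[:, 0] + Y_lag * lam[:, 1] + lagged_spatial * lam[:, 2]

# Hypothetical toy setting
p = 6
W = np.zeros((p, p))
for i in range(p):
    W[i, (i - 1) % p] = 0.5
    W[i, (i + 1) % p] = 0.5
lam = np.column_stack([np.linspace(-0.3, 0.3, p),
                       np.linspace(0.3, -0.1, p),
                       np.linspace(0.1, 0.3, p)])
S_inv = np.linalg.inv(np.eye(p) - np.diag(lam[:, 0]) @ W)
A = S_inv @ (np.diag(lam[:, 1]) + np.diag(lam[:, 2]) @ W)

# Noise-free path: y_t = A y_{t-1}
rng = np.random.default_rng(0)
Y = np.empty((11, p))
Y[0] = rng.normal(size=p)
for t in range(1, 11):
    Y[t] = A @ Y[t - 1]

Y_hat = fitted_values(Y, W, lam)
```

Note that (12) uses the contemporaneous spatial term, so these are in-sample fitted values; out-of-sample forecasts would instead iterate the reduced form of the model.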
Figure 7: The scatter plots of $y_{i,t}$ against $w_i^T y_t$ (panels on the top), $y_{i,t-1}$ (panels in the middle),
and $w_i^T y_{t-1}$ (panels on the bottom) for four selected countries: Belgium, Greece, France and
Iceland. The straight lines $y = \hat\lambda_{ji} x$ are superimposed in the panels on the top with $j = 0$, those
in the middle with $j = 1$, and those on the bottom with $j = 2$.

Overall $\hat y_{i,t}$ tracks the true value reasonably well. Figure 9 shows the out-of-sample forecasting
performance of our model. For the sake of comparison, predictions are made using our model and
the proposed generalized Yule-Walker estimator, and using the (constant) SDPD model of Yu et
al. (2008) and their Quasi-Maximum Likelihood estimator. In particular, for each location, we
leave out from the sample the last six observations and we compute the (out-of-sample) forecasts
with 1, 2, ..., 6-step-ahead forecasting horizons; then, we compute the average prediction error over
time (i.e. the mean of the 6 prediction errors). On the left panel of Figure 9, the two box-plots
summarize the average prediction error for the 31 locations obtained with our YW estimator and
the QML estimator of Yu et al. (2008), respectively. The predictions from our estimator appear to be
unbiased while those from the QML estimator appear to be biased. This advantage is also reflected
in the average squared forecasting errors, reported on the right panel of Figure 9. In conclusion,
the SDPD model of Yu et al. (2008) has a satisfactory forecasting performance because several
locations have similar spatial structure and for those locations a model with constant parameters
is sufficient. Nevertheless, a marginal improvement is observed for our estimator because several
locations have quite different structures and our model is able to capture this difference. Finally,
it is worth noting that the variability of the two predictors appears to be the same.
Figure 8: The monthly change rates of CPI (thin lines) of Belgium, Greece, France and Iceland, and their
estimated values (thick lines) by model (1).

Figure 9: Prediction errors generated in the out-of-sample forecasting, leaving out 6 observations from
the sample, using our model with the Generalized Yule-Walker estimator and using the constant SDPD
model of Yu et al. (2008) with the Quasi-Maximum Likelihood estimator.

To further vindicate the necessity of using different coefficients for different states, we consider
a statistical test for the hypothesis

$$ H_0 : \lambda_{j1} = \cdots = \lambda_{jp}, \qquad j = 0, 1, 2 $$

for model (1). The residuals resulting from the fitted model under $H_0$ will be greater than
the residuals without $H_0$. However if $H_0$ is true, the difference between the two sets of residuals
should not be significant. We apply a bootstrap method to test this significance. Let $\tilde\lambda_0, \tilde\lambda_1, \tilde\lambda_2$
be the estimates under hypothesis $H_0$. Define the test statistic

$$ U = \frac{1}{n} \sum_{t=1}^{n} \| \hat y_t - \tilde y_t \|_1, \qquad \tilde y_t = \tilde\lambda_0 W y_t + \tilde\lambda_1 y_{t-1} + \tilde\lambda_2 W y_{t-1}. $$

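We reject $H_0$ for large values of $U$. To assess how large is large, we generate bootstrap data
from

$$ y_t^* = \tilde\lambda_0 W y_t + \tilde\lambda_1 y_{t-1} + \tilde\lambda_2 W y_{t-1} + \varepsilon_t^*, $$

where $\{\varepsilon_t^*\}$ are drawn independently from the residuals

$$ \hat\varepsilon_t = y_t - \hat y_t, \qquad t = 1, \ldots, n, $$

and $\hat y_t$ consists of the components defined in (12). Now the bootstrap statistic is defined as

$$ U^* = \frac{1}{n} \sum_{t=1}^{n} \big\| y_t^* - (\lambda_0^* W y_t + \lambda_1^* y_{t-1} + \lambda_2^* W y_{t-1}) \big\|_1, $$

where $(\lambda_0^*, \lambda_1^*, \lambda_2^*)$ are the estimated coefficients for the regression model

$$ y_t^* = \lambda_0 W y_t + \lambda_1 y_{t-1} + \lambda_2 W y_{t-1} + \varepsilon_t, \qquad t = 1, \ldots, n. $$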

The P-value for testing the hypothesis $H_0$ is defined as

$$ P(U^* > U \mid y_1, \ldots, y_n), $$

which is approximated by the relative frequency of the event $(U^* > U)$ in a repeated bootstrap
sampling with a large number of replications. By repeating the bootstrap sampling 1000 times, the
estimated P-value is 0, exhibiting strong evidence against the null hypothesis $H_0$. Therefore the
model with equal slope parameters across different locations is inadequate for this particular
data set.

5.2 Modeling mortality rates 

Now we analyze the annual Italian male and female mortality rates for different ages (between 0
and 104) in the period 1950-2009 based on the proposed model (1). The data were downloaded
from the Human Mortality Database (see the website http://www.mortality.org/). Let $m_{i,t}$ be the
log mortality rate of females or males at age $i$ in year $t$. Those data are plotted in Figure 10.
The two panels on the left plot the female and male mortality against age in each year.
More precisely, the curves of $m_{i,t}$ against age $i$ for $t < 1970$ are plotted in red, those for $t \ge 1990$
are in blue, and those with $1970 \le t \le 1989$ are in grey. Those curves show clearly that the mortality
rate decreases over the years for almost all age groups (except a few outliers at the top end). The two
panels in the middle of Figure 10 plot the log mortality for each age group against time with the
following color code: black for ages not greater than 10, grey for ages between 11 and 100, and
green for ages greater than 100. They indicate that the mortality for all age groups decreases over
time, and that the most significant decreases occur in the young age groups. Furthermore, the fluctuation
of the mortality rates for the top age groups reduces significantly over the years, while the mean
mortality rates for those groups remain about the same. This can be seen more clearly in the two
panels on the right which plot the differenced log mortality rates $\{y_{i,t}, t = 1951, \ldots, 2009\}$, using the
same colour code, where $y_{i,t} = m_{i,t} - m_{i,t-1}$.

Figure 10: Log mortality rates of Italian females (3 top panels) and males (3 bottom panels) are plotted
against age for each year in 1950-2009 (2 left panels), and against year for each age group between 0 and 104
(2 middle panels). Differenced log mortality rates are plotted against year for each age in the 2 right panels.

We fit the differenced log mortality data with model (1), with the parameters estimated by 
(5) and $d_i = 20$. Note that now $p = 104$ and $n = 59$. Let the off-diagonal elements of the spatial 
weight matrix $W$ be 

Wij = ——--, 1 < i < j < 104. 

We then replace $w_{ij}$ by $w_{ij}/\sum_i w_{ij}$. Moreover, we can also fix a threshold $r$ and set to zero all 
the elements of matrix $W$ such that $|i - j| > r$ (for simplicity, we fix $r = 5$ in this application, 
but the results are substantially invariant for different values of $r$). 
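
The construction of a thresholded, normalised weight matrix can be illustrated as follows. This is only a sketch: the default base weight `1 / d` (inverse age distance) is a hypothetical placeholder rather than the weight used in the application, the function name `banded_weights` is an assumption, and the threshold is applied before the normalisation so that the retained weights sum to one.

```python
import numpy as np

def banded_weights(p, r, base=lambda d: 1.0 / d):
    """Zero-diagonal weight matrix with w_ij = 0 whenever |i - j| > r.

    base maps the distance |i - j| to a raw weight; the default is a
    hypothetical inverse-distance choice used only for illustration.
    """
    idx = np.arange(p)
    dist = np.abs(idx[:, None] - idx[None, :])
    W = np.zeros((p, p))
    keep = (dist > 0) & (dist <= r)          # off-diagonal entries within the band
    W[keep] = base(dist[keep].astype(float))
    W /= W.sum(axis=0, keepdims=True)        # replace w_ij by w_ij / sum_i w_ij
    return W

W = banded_weights(p=104, r=5)
```

Row normalisation, the more common convention, would divide by `W.sum(axis=1, keepdims=True)` instead.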

The results of the estimation are shown in Table 1 for a selection of cohorts of different ages. 
Figure 11 shows the fitted series for ages $i = 60, 80, 100$. 

    age   $\hat\lambda_{0i}$   $\hat\lambda_{1i}$   $\hat\lambda_{2i}$      age   $\hat\lambda_{0i}$   $\hat\lambda_{1i}$   $\hat\lambda_{2i}$
      5      0.41     -0.52      0.06          55      0.19     -0.88      0.28
     10      0.20     -0.42      0.05          60     -0.09     -0.72      0.01
     15      0.44     -0.65      0.18          65      0.22     -0.63      0.21
     20      0.64     -0.78      0.40          70      0.21     -0.69      0.08
     25     -0.04     -0.43      0.03          75      0.33     -0.59      0.22
     30      0.78     -0.80      0.55          80      0.33     -0.89      0.27
     35      0.11     -0.55      0.29          85      0.37     -0.76      0.18
     40     -0.04     -0.66     -0.01          90      0.29     -0.62      0.16
     45      0.29     -0.46      0.12          95      0.27     -0.77      0.26
     50     -0.10     -0.45     -0.05         100      0.44     -0.69     -0.03

Table 1: Estimated coefficients for a selection of cohorts of different ages. The left column is the estimated 
pure spatial coefficient $\hat\lambda_{0i}$; the middle column is the estimated pure dynamic coefficient $\hat\lambda_{1i}$; the right 
column is the estimated spatial-dynamic coefficient $\hat\lambda_{2i}$. 

Figure 11: Observed time series (thin line) and fitted time series (bold line), for female mortality rate for 
ages $i = 60, 80, 100$. 

6 Final remark 

We propose in this paper a generalized Yule-Walker estimation method for spatio-temporal models 
with diagonal coefficients. The setting enlarges the capacity of the popular spatial dynamic panel 
data models. Both the asymptotic results and the numerical illustrations show that the proposed 
estimation method works well, although the number of estimation equations utilized should 
be of the order $o(\sqrt{n})$. 
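
As a rough illustration of the estimation idea, the sketch below applies least squares to the sample Yule-Walker equations for a single location $i$, using the sample moments $\frac{1}{n}\sum_t y_{t-1}(w_i^T y_t)$, $\frac{1}{n}\sum_t y_{t-1} y_{i,t-1}$ and $\frac{1}{n}\sum_t y_{t-1}(w_i^T y_{t-1})$ that appear in the proofs. It assumes mean-zero data; the function name `yule_walker_fit` and the optional `rows` argument (a subset of the $p$ equations) are hypothetical, and no claim is made that this matches the paper's implementation in every detail.

```python
import numpy as np

def yule_walker_fit(Y, W, i, rows=None):
    """Least squares on the sample Yule-Walker equations for location i.

    Y    : (n+1, p) array with rows y_0, ..., y_n (assumed mean zero).
    W    : (p, p) spatial weight matrix; W[i] is the i-th row w_i.
    rows : optional indices selecting which of the p equations to use.
    Returns estimates of (lambda_0i, lambda_1i, lambda_2i).
    """
    n = Y.shape[0] - 1
    lag, cur = Y[:-1], Y[1:]
    S0 = lag.T @ lag / n      # (1/n) sum_t y_{t-1} y_{t-1}^T
    S1t = lag.T @ cur / n     # (1/n) sum_t y_{t-1} y_t^T
    X = np.column_stack([
        S1t @ W[i],           # (1/n) sum_t y_{t-1} (w_i^T y_t)
        S0[:, i],             # (1/n) sum_t y_{t-1} y_{i,t-1}
        S0 @ W[i],            # (1/n) sum_t y_{t-1} (w_i^T y_{t-1})
    ])
    z = S1t[:, i]             # (1/n) sum_t y_{t-1} y_{i,t}
    if rows is not None:
        X, z = X[rows], z[rows]
    lam, *_ = np.linalg.lstsq(X, z, rcond=None)
    return lam
```

When the $i$-th series satisfies the model equation exactly (no innovation term), the three coefficients are recovered up to rounding error, which gives a simple sanity check of the moment construction.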
Appendix: Proofs 


We present the proofs of Theorem 2, Corollary 1 and Theorem 4 in this appendix. The proofs 
of Theorems 1 and 3 are similar to, and simpler than, that of Theorem 2, and they are therefore 
omitted. We also present a lemma (i.e. Lemma 1) at the end of this appendix, which shows that 
condition A2 is implied by conditions A1 and B1 - B3; see Remark 1. We use $C$ to denote a 
generic positive constant, which may be different at different places. 

Proof of Theorem 2. We first prove (i) of Theorem 2. We only need to prove the assertions 
(1) and (2) below, as the required conclusion then follows from (1) and (2) immediately. 

(1)
$$\sqrt{n}\, U_i^{-1/2}
\begin{pmatrix}
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T} y_{i,t-1}\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_{t-1})\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1}
\end{pmatrix}
\xrightarrow{d} N(0, I_3).$$

(2)
$$V_i (\hat X_i^{T} \hat X_i)^{-1} \xrightarrow{P} I_3.$$



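To prove (1), it suffices to show that for any nonzero vector $a = (a_1, a_2, a_3)^T$, the linear 
combination 

$$a^{T} \sqrt{n}
\begin{pmatrix}
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T} y_{i,t-1}\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_{t-1})\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1}
\end{pmatrix}$$

is asymptotically normal. 

Let us take out the dominant term in $\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1}$ first. 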
For term $E_1$ and $k = 1, 2, \cdots, p$, by Proposition 2.5 of Fan and Yao (2003), we have 

$$E\Big| \frac{1}{n}\sum_{t=1}^{n} e_k^{T} y_{t-1}(w_i^{T} y_t) - e_k^{T}\Sigma_1^{T} w_i \Big|^{2} \le \frac{C}{n},$$

where $C$ is independent of $p$. Then it holds that 

$$\frac{1}{n}\sum_{t=1}^{n} e_k^{T} y_{t-1}(w_i^{T} y_t) - e_k^{T}\Sigma_1^{T} w_i = O_p\Big(\frac{1}{\sqrt{n}}\Big).$$

Therefore 

$$\Big\| \frac{1}{n}\sum_{t=1}^{n} y_{t-1}(w_i^{T} y_t) - \Sigma_1^{T} w_i \Big\|_2 = O_p\Big(\sqrt{\frac{p}{n}}\Big).$$

Similarly, 

$$\Big\| \frac{1}{n}\sum_{t=1}^{n} \varepsilon_{i,t}\, y_{t-1} \Big\|_2 = O_p\Big(\sqrt{\frac{p}{n}}\Big).$$



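Since $E_1 \le \big\| \frac{1}{n}\sum_{t=1}^{n} y_{t-1}(w_i^{T} y_t) - \Sigma_1^{T} w_i \big\|_2 \, \big\| \frac{1}{n}\sum_{t=1}^{n} \varepsilon_{i,t}\, y_{t-1} \big\|_2$, it holds that $E_1 = O_p(p/n)$. Similar 
to (14), we have $\mathrm{Var}(\sqrt{n} E_2) = O(1)$. Given $p/\sqrt{n} = o(1)$, it holds that $\sqrt{n} E_1 = o_p(1)$. Hence if 
$p = o(\sqrt{n})$, 

$$\sqrt{n}\, \frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\, \frac{1}{n}\sum_{t=1}^{n} \varepsilon_{i,t}\, y_{t-1} = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} w_i^{T} \Sigma_1 y_{t-1} \varepsilon_{i,t} + o_p(1).$$
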

Similarly, given $p = o(\sqrt{n})$, we have 
Now it suffices to prove 
is asymptotically normal. 
Note that it holds that 

$$E|w_i^{T} \Sigma_1 y_{t-1} \varepsilon_{i,t}|^{2+\gamma/2} \le \big[E|w_i^{T} \Sigma_1 y_{t-1}|^{4+\gamma}\big]^{1/2} \big[E|\varepsilon_{i,t}|^{4+\gamma}\big]^{1/2} < \infty.$$


Now we calculate the variance of S ntP . It holds that 
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ming up them, it follows from dominate convergence theorem that 


and sup p Y.%i l w f s i S y, E i(i) s oe i\ < oo. Calculating all the variance and covariance and sum- 
To prove the asymptotic normality of $S_{n,p}$, we employ the small-block and large-block arguments. 
We partition the set $\{1, 2, \cdots, n\}$ into $2k_n + 1$ subsets with large blocks of size $l_n$, small 
blocks of size $s_n$ and the last remaining set of size $n - k_n(l_n + s_n)$. Put 

$$l_n = [\sqrt{n}/\log n], \quad s_n = [(\sqrt{n}\log n)^{x}], \quad k_n = [n/(l_n + s_n)],$$

where tt— < x < 1. Hence 

$$l_n/\sqrt{n} \to 0, \quad s_n/l_n \to 0, \quad k_n = O(\sqrt{n}\log n).$$

Note that $l_n/\sqrt{n} \to 0$ is important when we derive the Lindeberg condition of the truncated 
partial sum T^ p defined in (fl6l) . 
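
The blocking scheme can be made concrete with a short sketch that computes $l_n$, $s_n$ and $k_n$ and lists the alternating large and small blocks followed by the remainder. The function name `block_partition` and the default value of the exponent $x$ are hypothetical choices for illustration; the argument assumes $n$ is large enough for $l_n \ge 1$.

```python
import math

def block_partition(n, x=0.9):
    """Split {1, ..., n} into k_n large blocks, k_n small blocks and a remainder.

    x is the exponent in the small-block size; 0.9 is an arbitrary
    illustrative value, not one prescribed by the proof.
    """
    root = math.sqrt(n)
    l_n = int(root / math.log(n))            # large-block size [sqrt(n) / log n]
    s_n = int((root * math.log(n)) ** x)     # small-block size [(sqrt(n) log n)^x]
    k_n = n // (l_n + s_n)                   # number of (large, small) pairs
    large, small = [], []
    start = 1
    for _ in range(k_n):
        large.append(range(start, start + l_n))
        start += l_n
        small.append(range(start, start + s_n))
        start += s_n
    remainder = range(start, n + 1)          # last set of size n - k_n (l_n + s_n)
    return l_n, s_n, k_n, large, small, remainder
```

The limits $s_n/l_n \to 0$ and $l_n/\sqrt{n} \to 0$ are asymptotic statements; for moderate $n$ the small blocks produced by this formula need not be shorter than the large ones.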

Since $\sum_{j=1}^{\infty} \alpha(j)^{\gamma/(4+\gamma)} < \infty$, we have $\alpha(s_n) = o(s_n^{-(4+\gamma)/\gamma})$. It then holds that 
Then we can partition $S_{n,p}$ in the following way 
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Note that a(n) = o(n 2(2+ 7 /2-2)) anc | k n s n /n —>• 0, (l n + s n )/n —>• 0, by applying proposition 2.7 
of Fan and Yao (2003), it holds that 
We calculate the variance of $T_{n,p}$. Similar to (15), it holds that 
Calculating all the variances and covariances and summing them up, by the dominated convergence 
theorem and $k_n l_n / n \to 1$, it holds that 
Now it suffices to prove the asymptotic normality of $T_{n,p}$. We partition $T_{n,p}$ into two parts via 
truncation. Specifically, we define 
Similar to computing $\mathrm{Var}(T_{n,p})$, it holds that 
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where we denote a\ as the asymptotic variance of T^ p . Similarly, we have 
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V 4a T U t a, 


+ 


+ 


+ 


/ \ 


-1 




'J Eexp 
i=i 


V^Ula 

, 7 /V _w(i)£ , _ _W(2)£ , _ _Lt(3)iV 
lt W 1 vYiS + “ 2 4ii4/ +a3 V^9 J 


^a T U if 


t z a. 


6XP l ■- 2 TO 


exp 


07 


2 a T U,,a 


6X P ' ~~2 


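Following the same arguments as part 2.7.7 of Fan and Yao (2003), for any $\epsilon > 0$, it holds that 
$M_{n,p} < \epsilon$ as $n, p \to \infty$. Hence 

$$\sqrt{n}\, a^{T}
\begin{pmatrix}
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T} y_{i,t-1}\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_{t-1})\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1}
\end{pmatrix}
\Big/ \sqrt{a^{T} U_i a} \xrightarrow{d} N(0, 1).$$

Substituting $a$ by $(U_i^{-1/2})^{T} a$, it holds that 

$$a^{T} \sqrt{n}\, U_i^{-1/2}
\begin{pmatrix}
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T} y_{i,t-1}\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_{t-1})\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1}
\end{pmatrix}
\xrightarrow{d} a^{T} N(0, I_3),$$

which leads to the fact that 

$$\sqrt{n}\, U_i^{-1/2}
\begin{pmatrix}
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T} y_{i,t-1}\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1} \\
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_{t-1})\,\frac{1}{n}\sum_{t=1}^{n}\varepsilon_{i,t}\, y_{t-1}
\end{pmatrix}
\xrightarrow{d} N(0, I_3).$$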
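To prove (2), let us look at the (1, 1)-th element of $\hat X_i^{T} \hat X_i$. We have 

$$\begin{aligned}
\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\, \frac{1}{n}\sum_{t=1}^{n} y_{t-1}(w_i^{T} y_t)
&= \Big\| \frac{1}{n}\sum_{t=1}^{n} y_{t-1}(w_i^{T} y_t) - \Sigma_1^{T} w_i \Big\|_2^{2} \\
&\quad + 2 w_i^{T} \Sigma_1 \Big( \frac{1}{n}\sum_{t=1}^{n} y_{t-1}(w_i^{T} y_t) - \Sigma_1^{T} w_i \Big) + w_i^{T} \Sigma_1 \Sigma_1^{T} w_i. 
\end{aligned} \qquad (17)$$

Using the same arguments as (14), the first term is $O_p(p/n)$ and the second term is $O_p(1/\sqrt{n})$. Hence 
given $p = o(n)$, it holds that 

$$\frac{\frac{1}{n}\sum_{t=1}^{n} y_{t-1}^{T}(w_i^{T} y_t)\, \frac{1}{n}\sum_{t=1}^{n} y_{t-1}(w_i^{T} y_t)}{w_i^{T} \Sigma_1 \Sigma_1^{T} w_i} \xrightarrow{P} 1.$$

Applying the same arguments to the other elements of $\hat X_i^{T} \hat X_i$, it holds that 

$$V_i (\hat X_i^{T} \hat X_i)^{-1} \xrightarrow{P} I_3.$$

To prove (ii) in Theorem 2, the required asymptotic result follows from (13) and (17) immediately 
when $p = o(n)$ and $\sqrt{n} = O(p)$. The proof is completed. □ 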

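Proof of Corollary 1. By Theorem 2, it holds that 

$$\begin{pmatrix} \hat\lambda_{0i} \\ \hat\lambda_{1i} \\ \hat\lambda_{2i} \end{pmatrix} - \begin{pmatrix} \lambda_{0i} \\ \lambda_{1i} \\ \lambda_{2i} \end{pmatrix} =
\begin{cases}
O_p\big(\frac{1}{\sqrt{n}}\big) & \text{if } \frac{p}{\sqrt{n}} \to 0, \\
O_p\big(\frac{p}{n}\big) & \text{if } \frac{p}{\sqrt{n}} \to \infty \text{ and } \frac{p}{n} = o(1),
\end{cases}$$
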

for all $i$. The required asymptotic result follows from the above result directly. □ 

Proof of Theorem 4. Let us look at terms $E_1$ and $E_2$ in (13) first under the new condition (A5). 
Similar to the proof of (14), it holds that 



Hence 


Similarly, we have 


El=0p( ^El), a = o p( £^M). 

n y/n 


- Xyr-.fwfy,)! = o r e~^El + 

n n n y/n 


1 y- T 1 A n 5 o / 4 (i>) > 

t =1 t =1 v 
For the first diagonal element of $\hat X_i^{T} \hat X_i$, it follows from considering the three terms in (17) 
separately that 


-t n n 

-£y^-i(wryt)-£yt-i(wfy t ) = o p ( 

n z — J n z — J 


PSl(p) , s 0 /4 (p) S 1 /4 (p)n , T v WT. 


+ 


t=l 


t=l 


n 




+ w) XiXf w i 


Similarly, 


1 T 1 r\ t P s ^P) , 


t =i 


i=l 


P»1 (P) , s 0 /4 (p) s 1 /4 (p)n . Tv vTT 
— + --) + e 4 X 0 S 0 e 4 , 


“ X^y^-i(wfyt-i)^- ^yt-i(wfyt-i) = O p ( + 5 ° ^ ) + wfS 0 So w »- 


t=l 


t=l 


Given psi ffi = o(n) and — 4 ^——— = 0 ( 1 ), we have 

82 (p) v ’ p^ /2 (p) S2 (p) v 

p«i(p) / , y* •So /4 (p) s i /4 (^) f t W 

- = o(s 2 (p)), — -7=-= os 2 (p ■ 

n V n 

Divide both the numerator and denominator of estimator (J 3 J) by s 2 (p), it holds that 


^ Ao i ^ 
An 

V / 


^ Aoi ^ 
Ah 

V A2 ' / 


psi /4 (p) + ^“(p) \ 

V ns 2 (p ) y/ns 2 {p))' 


1/4 / 


= o r , 


The required result then follows directly. 


□ 


Lemma 1 Under conditions A1 and B1 - B3, condition A2 holds with $\gamma = 4$. 

Proof. It is apparent that part (a) of A2 is satisfied under A1 and B1 - B3. $y_t$ is strictly 
stationary because $\varepsilon_{i,t}$ are i.i.d. across $i$ and $t$ and condition B3 holds. Since the density function of $\varepsilon_{i,t}$ 
exists, $\alpha(n)$ decays exponentially fast; see Pham and Tran (1985). Therefore $\sum_{j=1}^{\infty} \alpha(j)^{\gamma/(4+\gamma)} < \infty$. 

Now we prove A2(c) when $\gamma = 4$. 

We present a more general result first: for any $p \times 1$ vector $a$ satisfying $\sup_p \|a\|_1 < \infty$, it 
holds that 

$$\sup_p E|a^{T} y_t|^{8} < \infty.$$

Note that 

$$y_t = \sum_{h=0}^{\infty} A^{h} S^{-1}(\lambda_0)\, \varepsilon_{t-h} = \sum_{h=0}^{\infty} B_h \varepsilon_{t-h}.$$

Then 


E|a r y t | 8 = E 


Y a T B h £ t . 


—h 


h =0 


= E 


ZXe t 


—h 


h —0 


= E Y (^-,1 b ^i h h 2 £ t-h 2 ) (ej- h3 bh 3 b l A e t - hi ) (ef_ h5 h h5 b£ 6 e t _ K ){ej_ h7 e t _ h8 ) 

/ll ,/l2 j ^-3 j^-4 ,^-5 j Hq , hy , /l8=0 

OO P p 

E ( E [b/i 1 b^ 2 ]j i:? 1 £j 1 ) t_/j 1 £j 1 ) t_/ ! , 2 ^ ^ ^ ] [b/i 3 b fe 4 ]j 2 j 2 £j 2 ) t_/j 3 £j 2 ) t _/ l4 

hi,h 2 ,h 3 ,h4,h 5 ,h 6 ,h7,hg=0 ii,ji=l «2,92=1 

P P 

X ( y! [bfesb^gjigJgEig^—^gEjg^—feg^ ^ [^7 b ftg ] j 4 j 4 £ j 4 ^-/^ £ j 4 ^^ 


*3 J3 = l 


*4 J4=l 


=E 


'y 1 ^ 1 [b/iib/ l 2 ]j 1 j 1 [b/i 3 b^ 4 ]j 2 j 2 [b/j 5 b^g]j 3 j 3 [b/ l 7 b^ g ]j 4 j 4 

hl,h 2 ,h3,h4,hs,h6,hr,hs=0 h ,9l>*2,92,*3 J3,*4 ,94 = 1 


X ^ii ,t—hi^ji,t—h 2 ^-Z 2 ,t—h 3 £j 2 ,t—h4^i3,t—hs £ 73 ,t—he^ 14 ,t—hj£j4,t—hs 
OO p 

S E E 

h 3 ,h 2 ,h 3 ,h4,h 5 ,h & ,h 7 ,h 8 =G H ,9i,*2,92,*3 ,93,*4,94 =1 

X ^ J \^il,t—hi^ji,t—h 2 ^i 2 ,t—h3^j 2 ,t—h4^i3,t—h3^j3,t—he^i4,t—h 7 ^j4,t~ 


[bh 1 b^ 2 ]j 1 j 1 [b/i 3 bjj 4 ] j 2 j 2 [b^b ?l6 ] *3J3 [b/i 7 b/j g ] 

*494 


£ c E E 

^1 ,h2,h3,h4,h5,h6,h7,h8=0 il ,j 1 ,«2 J 2 ,23 J3 ,24 J4=l 

OO OO P P 

=^[EEEEn» 

ft=0 9=0 i=l j=l 


/J1 - . rj^\ rj~i rj~t 

b/iib /, 2 li^'j |b/! 3 b ^ 4 |j 2 j 2 |b^ 5 b^g |j 3 j 3 |b^ 7 b^ g |j 4 j 4 


Ti 

9 


(18) 


And 


/i=0 (7=0 2— 1 j=l h =0 g =0 2=1 j=l 2=1 j=l /i=0 g=0 

p p 00 00 p p 00 00 

EE(Ei b ‘iEi b H) y = EE(Ei b 4 (Ei b »i),. 

i=l j=l ft=0 9=0 1 i =1 j=l h =0 9=0 

p OO p OO 

E(EM,E(EM 


*9 


(19) 


i=l h=0 9 = 1 9=0 

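where $(\sum_{h=0}^{\infty} |b_h|)_i$ is the $i$-th element of the column vector $\sum_{h=0}^{\infty} |b_h|$. 

Since $(\sum_{h=0}^{\infty} |B_h|)_{ij} = \sum_{h=0}^{\infty} (|A^{h} S^{-1}(\lambda_0)|)_{ij} \le (\sum_{h=0}^{\infty} |A^{h}|\, |S^{-1}(\lambda_0)|)_{ij}$, where the row and 
column sums of $\sum_{h=0}^{\infty} |A^{h}|\, |S^{-1}(\lambda_0)|$ are bounded uniformly in $p$, it holds that the row and 
column sums of $\sum_{h=0}^{\infty} |B_h|$ are bounded uniformly in $p$. Note that 

$$\Big(\sum_{h=0}^{\infty} |b_h|\Big)_i = \Big(\sum_{h=0}^{\infty} |B_h^{T} a|\Big)_i \le \Big(\sum_{h=0}^{\infty} |B_h^{T}|\, |a|\Big)_i,$$
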
where the row and column sums of $\sum_{h=0}^{\infty} |B_h|$ and $|a|$ are bounded uniformly in $p$. Hence the 
row and column sums of $\sum_{h=0}^{\infty} |B_h^{T}|\, |a|$ are bounded uniformly in $p$. It follows from (18) and (19) 
that 

$$\sup_p E|a^{T} y_t|^{8} \le C \Big[ \sum_{i=1}^{p} \Big(\sum_{h=0}^{\infty} |b_h|\Big)_i \sum_{j=1}^{p} \Big(\sum_{g=0}^{\infty} |b_g|\Big)_j \Big]^{4} = O(1).$$



It is easy to prove that 


sup || Ho w ? ;||i < oo, sup||H^Wj||i < oo, sup ||Hoe,.||i < oo. 
p p p 

Thus sup p ||w ? ;Hoyt||i < oo and etc. 

The row and column sums of Hq and Hi are bounded uniformly in p. Then 


sup wj H] w,; = 0(1). 
p 

Similarly, we can prove that the other diagonal elements of $V_i$ and $U_i$ are bounded uniformly in $p$. 
The proof is completed. □ 